The Tree Inclusion Problem: In Linear Space and Faster
Given two rooted, ordered, and labeled trees $P$ and $T$, the tree inclusion
problem is to determine if $P$ can be obtained from $T$ by deleting nodes in
$T$. This problem has recently been recognized as an important query primitive
in XML databases. Kilpeläinen and Mannila [SIAM J. Comput. 1995]
presented the first polynomial-time algorithm using quadratic time and space.
Since then, several improved results have been obtained for special cases when
$P$ and $T$ have a small number of leaves or small depth. However, in the worst
case these algorithms still use quadratic time and space. Let $n_S$, $l_S$, and
$d_S$ denote the number of nodes, the number of leaves, and the depth
of a tree $S \in \{P, T\}$. In this paper we show that the tree inclusion
problem can be solved in space $O(n_T)$ and time
$O(\min(l_P n_T, l_P l_T \log\log n_T + n_T, \frac{n_P n_T}{\log n_T} + n_T \log n_T))$.
This improves or matches the best known time complexities while using only
linear space instead of quadratic. This is particularly important in practical
applications, such as XML databases, where space is likely to be a bottleneck.
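To make the problem definition concrete, here is a minimal Python sketch of the naive recursive test for whether a pattern tree $P$ can be obtained from a target tree $T$ by deleting nodes. It is not the linear-space algorithm of the paper: the encoding (a tree as a nested tuple `(label, children)`, a forest as a tuple of trees) and the helper `included` are assumptions for illustration only. Memoization keeps small inputs fast, but the number of distinct subproblems can grow exponentially, so it illustrates the problem rather than the stated bounds.

```python
from functools import lru_cache
from typing import Tuple

# Assumed encoding: a tree is (label, children) where children is a tuple of
# trees; a forest is a tuple of trees. Tuples keep everything hashable.
Tree = Tuple[str, tuple]


def included(p: Tree, t: Tree) -> bool:
    """Naively decide whether pattern tree p can be obtained from t by deleting nodes."""

    @lru_cache(maxsize=None)
    def incl(fp: tuple, ft: tuple) -> bool:
        if not fp:
            return True   # delete every remaining node of ft
        if not ft:
            return False  # pattern nodes left, but no target nodes to match them
        (t_label, t_kids), t_rest = ft[0], ft[1:]
        (p_label, p_kids), p_rest = fp[0], fp[1:]
        # Case 1: delete the leftmost root of ft; its children take its place.
        if incl(fp, t_kids + t_rest):
            return True
        # Case 2: keep it. It then becomes the leftmost root of the result, so it
        # must match fp[0]: p's children come from its subtree, the rest from t_rest.
        return (t_label == p_label
                and incl(p_kids, t_kids)
                and incl(p_rest, t_rest))

    return incl((p,), (t,))
```

For $T = a(b, c(d), e)$, deleting $c$ and $b$ yields $a(d, e)$, so that pattern is included, while $a(e, d)$ is not, because deleting nodes never reorders siblings.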
Controlled non uniform random generation of decomposable structures
Consider a class of decomposable combinatorial structures, using different
types of atoms $\mathcal{Z} = \{Z_1, \ldots, Z_{|\mathcal{Z}|}\}$. We address the
random generation of such structures with respect to a size $n$ and a targeted
distribution over $k$ of its distinguished atoms. We consider two
variations on this problem. In the first alternative, the targeted distribution
is given by real numbers $f_1, \ldots, f_k$ such that $0 < f_i < 1$ for all $i$
and $f_1 + \cdots + f_k \leq 1$. We aim to generate random structures among the
whole set of structures of a given size $n$, in such a way that the expected
frequency of any distinguished atom $Z_i$ equals $f_i$. We address this problem
by weighting the atoms with a $k$-tuple $\pi$ of real-valued weights, inducing a
weighted distribution over the set of structures of size $n$. We first adapt the
classical recursive random generation scheme into an algorithm taking
$O(n^{1+o(1)} + mn\log n)$ arithmetic operations to draw $m$ structures from
the $\pi$-weighted distribution. Secondly, we address the analytical
computation of weights such that the targeted frequencies are achieved
asymptotically, i.e. for large values of $n$. We derive systems of functional
equations whose resolution gives an explicit relationship between $\pi$
and $f_1, \ldots, f_k$. Lastly, we give an algorithm in $O(k n^4)$ for the
inverse problem, i.e. computing the frequencies associated with a given
$k$-tuple $\pi$ of weights, and an optimized version in $O(k n^2)$ in the case
of context-free languages. This allows for a heuristic resolution of the
weights/frequencies relationship suitable for complex specifications. In the
second alternative, the targeted distribution is given by $k$ natural numbers
$n_1, \ldots, n_k$ such that $n_1 + \cdots + n_k + r = n$, where $r \geq 0$ is
the number of undistinguished atoms. The structures must be generated uniformly
among the set of structures of size $n$ that contain exactly $n_i$ atoms $Z_i$
($1 \leq i \leq k$). We give an $O(r^2 \prod_{i=1}^k n_i^2 + m n k \log n)$
algorithm for generating $m$ structures, which simplifies into
$O(r \prod_{i=1}^k n_i + m n)$ for regular specifications.
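As an illustration of the classical recursive scheme applied to a weighted distribution, the Python sketch below draws Motzkin words from an assumed toy grammar $S \to \varepsilon \mid u\,S \mid a\,S\,b\,S$, in which the distinguished atom $u$ carries a weight $\pi$, so that a word $w$ of length $n$ is drawn with probability $\pi^{|w|_u}/W_n$. The grammar and the helper names `weighted_counts` and `draw` are hypothetical. The precomputation costs $O(n^2)$ and each draw uses a naive sequential search, so this is not the $O(n^{1+o(1)} + mn\log n)$ algorithm of the abstract.

```python
import random
from typing import List, Optional, Tuple


def weighted_counts(n: int, pi: float) -> List[float]:
    """w[m] = sum, over Motzkin words of length m, of pi ** (number of u's)."""
    w = [0.0] * (n + 1)
    w[0] = 1.0
    for m in range(1, n + 1):
        total = pi * w[m - 1]              # S -> u S
        for k in range(m - 1):             # S -> a S b S with an inner word of length k
            total += w[k] * w[m - 2 - k]
        w[m] = total
    return w


def draw(n: int, w: List[float], pi: float, rng: random.Random) -> str:
    """Draw one word of length n with probability pi ** (#u) / w[n]."""
    if n == 0:
        return ""
    # Each option is (k, weight); k = None stands for the production S -> u S.
    options: List[Tuple[Optional[int], float]] = [(None, pi * w[n - 1])]
    options += [(k, w[k] * w[n - 2 - k]) for k in range(n - 1)]
    r = rng.random() * w[n]
    chosen = options[-1][0]                # fallback in case of float round-off
    for k, weight in options:
        if r < weight:
            chosen = k
            break
        r -= weight
    if chosen is None:
        return "u" + draw(n - 1, w, pi, rng)
    return "a" + draw(chosen, w, pi, rng) + "b" + draw(n - 2 - chosen, w, pi, rng)
```

With $\pi = 1$ the counts are the Motzkin numbers and the draw is uniform; raising $\pi$ above 1 increases the expected frequency of $u$, and picking the weight that yields a targeted frequency is the weights/frequencies problem addressed in the abstract.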
Tree model guided candidate generation for mining frequent subtrees from XML
Due to the inherent flexibility in both structure and semantics, mining XML association rules faces several challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted, labeled, ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables an efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, non-redundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity than the commonly used join approach. In this paper, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets, against two well-known algorithms for mining induced and embedded subtrees, demonstrate the effectiveness and the efficiency of the proposed techniques.
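The embedding list idea can be illustrated with a small Python sketch. Under our own assumptions, not the MB3-Miner data structures, each tree is encoded in preorder as `(label, depth)` pairs; the hypothetical `embedding_lists` maps every node to the positions of its descendants (or only its children, for induced subtrees), and the hypothetical `frequent_2_subtrees` uses these lists so that only size-2 candidates actually occurring in the data are enumerated, counting support as the number of trees that contain a pattern. Growing candidates beyond two nodes, as TMG does, is omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed encoding: a tree is its preorder list of (label, depth) pairs,
# with depth 0 for the root.
Tree = List[Tuple[str, int]]


def embedding_lists(tree: Tree, induced: bool = False) -> Dict[int, List[int]]:
    """Map each preorder position to the positions of its descendants,
    or only of its children when induced=True."""
    el: Dict[int, List[int]] = {}
    for i, (_, d) in enumerate(tree):
        el[i] = []
        for j in range(i + 1, len(tree)):
            dj = tree[j][1]
            if dj <= d:
                break                      # we have left the subtree rooted at i
            if not induced or dj == d + 1:
                el[i].append(j)
    return el


def frequent_2_subtrees(db: List[Tree], min_support: int,
                        induced: bool = False) -> Dict[Tuple[str, str], int]:
    """Enumerate only (ancestor label, descendant label) pairs that occur in
    the data and keep those contained in at least min_support trees."""
    support: Counter = Counter()
    for tree in db:
        el = embedding_lists(tree, induced)
        # A set, so each tree adds at most 1 to a pattern's support.
        found = {(tree[i][0], tree[j][0]) for i, js in el.items() for j in js}
        support.update(found)
    return {pair: s for pair, s in support.items() if s >= min_support}
```

Because candidates are read off the embedding lists, no pair that is absent from every tree is ever generated, which is the property that distinguishes data-guided enumeration from blind joins.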
New data on the morphology of Sphenothallus Hall: implications for its affinities
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/73676/1/j.1502-3931.1992.tb01378.x.pd
Automatic congestion detection in MPSoC programs using data mining on simulation traces
ISBN: 978-1-4673-2786-2. International audience.
The efficient deployment of parallel software, particularly legacy software, on multiprocessor systems-on-chip (MPSoC) is a challenging task. In this paper, we introduce the use of a data-mining approach on traces of a functionally correct program to automatically identify recurring congestion points and their sources. Each memory transaction occurring in the system (i.e. instruction fetch, data load, and data store) is logged using a virtual platform of the system. The resulting trace is analyzed to discover memory access patterns that occur frequently and exhibit high latencies. These patterns are sorted in decreasing order of occurrence and estimated congestion level, making the sources of inefficiency easy to identify. We simulated an MPSoC with 16 processors running multiple applications and, by analyzing gigabytes of traces with this technique, were able to automatically detect congestion on resources and its sources in the parallel program.
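A heavily simplified version of the trace-analysis step can be sketched in Python. It assumes a hypothetical trace format (`Transaction` records holding the processor, access kind, address, and latency in cycles), treats a "pattern" as a (processor, access kind, memory region) triple, and uses mean latency as a stand-in for the estimated congestion level. The paper's data-mining method is not reproduced here; `congestion_report` and its parameters are illustrative names only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pattern = Tuple[int, str, int]  # (processor, access kind, memory region)


@dataclass
class Transaction:
    cpu: int      # initiating processor
    kind: str     # "fetch", "load" or "store"
    address: int
    latency: int  # cycles from issue to completion


def congestion_report(trace: List[Transaction], region_bits: int = 12,
                      min_count: int = 2, min_latency: float = 0.0
                      ) -> List[Tuple[Pattern, int, float]]:
    """Group transactions into patterns, keep frequent high-latency ones, and sort
    them by decreasing occurrence, then by decreasing mean latency."""
    stats: Dict[Pattern, List[int]] = defaultdict(lambda: [0, 0])  # [count, total latency]
    for t in trace:
        pattern = (t.cpu, t.kind, t.address >> region_bits)
        stats[pattern][0] += 1
        stats[pattern][1] += t.latency
    rows = [(p, count, total / count) for p, (count, total) in stats.items()
            if count >= min_count and total / count >= min_latency]
    rows.sort(key=lambda row: (-row[1], -row[2]))
    return rows
```

Grouping addresses into regions with a right shift keeps accesses to the same memory bank or buffer together, so a hot shared resource shows up as a single frequent, slow pattern.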
Raising the Dead; Extending Evolutionary Algorithms with a Case-based Memory
In dynamically changing environments, the performance of a standard evolutionary algorithm deteriorates. This is because the population, which is considered to contain the history of the evolutionary process, does not contain enough information to allow the algorithm to react adequately to changes in the fitness landscape. Therefore, we added a simple, global case-based memory to the process to keep track of interesting historical events. By introducing this memory together with a storing and replacement scheme, we were able to improve the reaction capabilities of an evolutionary algorithm with a periodically changing fitness function.
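The storing and replacement idea can be sketched with a toy genetic algorithm on bit strings in Python. Every detail here is an assumption rather than the authors' scheme: the fitness function alternates periodically between target strings, the change points are known in advance instead of being detected, the best individual at the end of each period is stored as a case, the oldest case is dropped when the memory is full, and after each change the stored cases replace the worst individuals. `evolve` is a hypothetical helper that returns the best fitness of each generation.

```python
import random
from typing import List

Genome = List[int]


def fitness(g: Genome, target: Genome) -> int:
    """Number of positions where g agrees with the current target."""
    return sum(1 for a, b in zip(g, target) if a == b)


def evolve(targets: List[Genome], generations: int, period: int,
           pop_size: int = 20, memory_size: int = 5,
           use_memory: bool = True, seed: int = 0) -> List[int]:
    """Run a toy GA whose target switches every `period` generations and
    return the best fitness observed in each generation."""
    rng = random.Random(seed)
    n = len(targets[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    memory: List[Genome] = []
    history: List[int] = []
    for gen in range(generations):
        target = targets[(gen // period) % len(targets)]
        if use_memory and gen > 0 and gen % period == 0:
            # The fitness function just changed: the stored cases replace the
            # worst individuals under the new target.
            pop.sort(key=lambda g: fitness(g, target))
            for i, case in enumerate(memory[:pop_size]):
                pop[i] = list(case)
        pop.sort(key=lambda g: fitness(g, target), reverse=True)
        history.append(fitness(pop[0], target))
        if use_memory and gen % period == period - 1:
            # Store the best individual of the ending period; forget the oldest case.
            memory.append(list(pop[0]))
            if len(memory) > memory_size:
                memory.pop(0)
        # Truncation selection, one-point crossover and bit-flip mutation.
        parents = pop[: pop_size // 2]
        children: List[Genome] = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            children.append([bit ^ int(rng.random() < 1.0 / n) for bit in child])
        pop = parents + children
    return history
```

Because the case stored at the end of the first period is reinserted when the first target returns, the best fitness at that point cannot fall below what was reached before, which is the kind of faster reaction the memory is meant to provide.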
GenRGenS: Software for Generating Random Genomic Sequences and Structures (Bioinformatics Applications Note)
Summary: GenRGenS is a software tool dedicated to randomly generating genomic sequences and structures. It handles several classes of models useful for sequence analysis, such as Markov chains, hidden Markov models, weighted context-free grammars, regular expressions, and PROSITE expressions. GenRGenS is the only program that can handle weighted context-free grammars, thus allowing the user to model and generate structured objects (such as RNA secondary structures) of any desired size. GenRGenS also allows the user to combine several of these models at the same time. Availability: Source and executable files of GenRGenS (in Java) and the complete user’s manual are freely available a
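For the simplest model class listed, a Markov chain, random sequence generation can be sketched in a few lines of Python. This is not the GenRGenS interface (GenRGenS itself is written in Java): `generate_markov` and the dictionary-based model format are hypothetical, and the chain is assumed to be first-order.

```python
import random
from typing import Dict


def generate_markov(length: int, initial: Dict[str, float],
                    transitions: Dict[str, Dict[str, float]],
                    rng: random.Random) -> str:
    """Sample a sequence of the given length from a first-order Markov chain.

    initial maps each state to its starting weight; transitions[s] maps each
    successor of state s to its (unnormalized) transition weight.
    """
    if length <= 0:
        return ""
    states = list(initial)
    seq = [rng.choices(states, weights=[initial[s] for s in states])[0]]
    while len(seq) < length:
        row = transitions[seq[-1]]
        successors = list(row)
        seq.append(rng.choices(successors, weights=[row[s] for s in successors])[0])
    return "".join(seq)
```

Passing an explicit `random.Random` instance makes runs reproducible from a seed, which is convenient when random sequences serve as a null model in sequence analysis.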